Autocorrelation and Stationarity

source("../code/init/chapter_start.R")
knitr::read_chunk('../code/chapter/02_stationarity_autocorrelation.R')

"One of the first things taught in introductory statistics textbooks is that correlation is not causation. It is also one of the first things forgotten.", Thomas Sowell

In this chapter we will discuss and formalize how knowledge about $X_{t-1}$ (or more generally $\Omega_t$) can provide us with some information about the properties of $X_t$. In particular, we will consider the correlation (or covariance) of $(X_t)$ at different times such as $\corr \left(X_t, X_{t+h}\right)$. This "form" of correlation (covariance) is called the autocorrelation (autocovariance) and is a very useful tool in time series analysis. However, if we do not assume that a time series is characterized by a certain form of "stability", it would be rather difficult to estimate $\corr \left(X_t, X_{t+h}\right)$ as this quantity would depend on both $t$ and $h$, leading to more parameters to estimate than observations available. Therefore, the concept of stationarity is convenient in this context as it allows us (among other things) to assume that

\[\corr \left(X_t, X_{t+h}\right) = \corr \left(X_{t+j}, X_{t+h+j}\right), \;\;\; \text{for all $j$},\]

implying that the autocorrelation (or autocovariance) is only a function of the lag between observations (rather than time itself). These two concepts (i.e. autocorrelation and stationarity) will be discussed in this chapter. Before moving on, it is helpful to remember that correlation (or autocorrelation) is only appropriate to measure a very specific kind of dependence, i.e. the linear dependence. There are many other forms of dependence as illustrated in the bottom panels of the graph below, which all have a (true) zero correlation:

knitr::include_graphics("../images/corr_example.png")

Several other metrics have been introduced in the literature to assess the degree of "dependence" between two random variables; however, this goes beyond the material discussed in this chapter.

The Autocorrelation and Autocovariance Functions

Definitions

The autocovariance function of a series $(X_t)$ is defined as

\[{\gamma_x}\left( {t,t+h} \right) = \cov \left( {{X_t},{X_{t+h}}} \right),\]

where the definition of covariance is given by:

\[ \cov \left( {{X_t},{X_{t+h}}} \right) = \mathbb{E}\left[ {{X_t}{X_{t+h}}} \right] - \mathbb{E}\left[ {{X_t}} \right]\mathbb{E}\left[ {{X_{t+h}}} \right]. \]

Similarly, the above expectations are defined to be:

\[\begin{aligned} \mathbb{E}\left[ {{X_t}} \right] &= \int\limits_{ - \infty }^\infty {x \cdot {f_t}\left( x \right)dx}, \\ \mathbb{E}\left[ {{X_t}{X_{t+h}}} \right] &= \int\limits_{ - \infty }^\infty {\int\limits_{ - \infty }^\infty {{x_1}{x_2} \cdot f_{t,t+h}\left( {{x_1},{x_2}} \right)d{x_1}d{x_2}} }, \end{aligned} \]

where ${f_t}\left( x \right)$ and $f_{t,t+h}\left( {{x_1},{x_2}} \right)$ denote, respectively, the density of $X_t$ and the joint density of the pair $(X_t, X_{t+h})$. Since we generally consider stochastic processes with constant zero mean we often have

\[{\gamma_x}\left( {t,t+h} \right) = \mathbb{E}\left[X_t X_{t+h} \right]. \]

In addition, we normally drop the subscript referring to the time series (i.e. $x$ in this case) if it is clear from the context which time series the autocovariance refers to. For example, we generally use ${\gamma}\left( {t,t+h} \right)$ instead of ${\gamma_x}\left( {t,t+h} \right)$. Moreover, the notation is even further simplified when the covariance of $X_t$ and $X_{t+h}$ is the same as that of $X_{t+j}$ and $X_{t+h+j}$ (for all $j$), i.e. when the covariance depends only on the time between observations and not on the specific time $t$. This is an important property called stationarity, which will be discussed in the next section. In this case, we simply use the following notation: \[\gamma \left( {h} \right) = \cov \left( X_t , X_{t+h} \right). \]

This notation will generally be used throughout the text and implicitly assumes certain properties (i.e. stationarity) of the process $(X_t)$. Several remarks can be made on the autocovariance:

  1. The autocovariance function is symmetric. That is, ${\gamma}\left( {h} \right) = {\gamma}\left( -h \right)$ since $\cov \left( {{X_t},{X_{t+h}}} \right) = \cov \left( X_{t+h},X_{t} \right)$.
  2. The autocovariance function "contains" the variance of the process as $\var \left( X_{t} \right) = {\gamma}\left( 0 \right)$.
  3. We have that $|\gamma(h)| \leq \gamma(0)$ for all $h$. The proof of this inequality is direct and follows from the Cauchy-Schwarz inequality, i.e. \[ \begin{aligned} \left(|\gamma(h)| \right)^2 &= \gamma(h)^2 = \left(\mathbb{E}\left[\left(X_t - \mathbb{E}[X_t] \right)\left(X_{t+h} - \mathbb{E}[X_{t+h}] \right)\right]\right)^2 \\ &\leq \mathbb{E}\left[\left(X_t - \mathbb{E}[X_t] \right)^2 \right] \mathbb{E}\left[\left(X_{t+h} - \mathbb{E}[X_{t+h}] \right)^2 \right] = \gamma(0)^2. \end{aligned} \]
  4. Just as any covariance, ${\gamma}\left( {h} \right)$ is "scale dependent" since ${\gamma}\left( {h} \right) \in \real$, or $-\infty \le {\gamma}\left( {h} \right) \le +\infty$. We therefore have:
    • if $\left| {\gamma}\left( {h} \right) \right|$ is "close" to zero, then $X_t$ and $X_{t+h}$ are "weakly" (linearly) dependent;
    • if $\left| {\gamma}\left( {h} \right) \right|$ is "far" from zero, then the two random variables present a "strong" (linear) dependence. However, it is generally difficult to assess what "close" and "far" from zero mean in this case.
  5. ${\gamma}\left( {h} \right)=0$ does not imply that $X_t$ and $X_{t+h}$ are independent but simply that $X_t$ and $X_{t+h}$ are uncorrelated. Independence is only implied by ${\gamma}\left( {h} \right)=0$ in the jointly Gaussian case.

As hinted in the introduction, an important related statistic is the correlation of $X_t$ with $X_{t+h}$ or autocorrelation, which is defined as

$$\rho \left( h \right) = \corr\left( {{X_t},{X_{t + h}}} \right) = \frac{\cov\left( {{X_t},{X_{t + h}}} \right)}{\sigma_{X_t}\, \sigma_{X_{t + h}}} = \frac{\gamma(h) }{\gamma(0)}.$$

Similarly to $\gamma(h)$, it is important to note that the above notation implies that the autocorrelation function is only a function of the lag $h$ between observations. Thus, autocovariances and autocorrelations are one possible way to describe the joint distribution of a time series. Indeed, the correlation of $X_t$ with $X_{t+1}$ is an obvious measure of how persistent a time series is.

Remember that just as with any correlation:

  1. $\rho \left( h \right)$ is "scale free" so it is much easier to interpret than $\gamma(h)$.
  2. $|\rho \left( h \right)| \leq 1$ since $|\gamma(h)| \leq \gamma(0)$.
  3. Causation and correlation are two very different things!

A Fundamental Representation

Autocovariances and autocorrelations also turn out to be a very useful tool because they are one of the fundamental representations of time series. Indeed, if we consider a zero mean normally distributed process, it is clear that its joint distribution is fully characterized by the autocovariances $\mathbb{E}[X_t X_{t+h}]$ (since the joint probability density depends only on these covariances). Once we know the autocovariances we know everything there is to know about the process and therefore: if two such processes have the same autocovariance function, then they are the same process.

Admissible Autocorrelation Functions

Since the autocorrelation is related to a fundamental representation of time series, it implies that one might be able to define a stochastic process by picking a set of autocorrelation values (assuming for example that $\var(X_t) = 1$). However, it turns out that not every collection of numbers, say $\left\{\rho_1, \rho_2, \ldots\right\}$, can represent the autocorrelation of a process. Indeed, two conditions are required to ensure the validity of an autocorrelation sequence:

  1. $\operatorname{max}_j \; | \rho_j| \leq 1$.
  2. $\var \left[\sum_{j = 0}^\infty \alpha_j X_{t-j} \right] \geq 0 \;$ for all $\left\{\alpha_0, \alpha_1, \ldots\right\}$.

The first condition is obvious and simply reflects the fact that $|\rho \left( h \right)| \leq 1$, but the second is far more difficult to verify. To further our understanding of the latter, we let $\alpha_j = 0$ for $j > 1$; then condition 2 implies that

\[\var \left[ \alpha_0 X_{t} + \alpha_1 X_{t-1} \right] = \gamma_0 \begin{bmatrix} \alpha_0 & \alpha_1 \end{bmatrix} \begin{bmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix} \geq 0. \]

Thus, the matrix

\[ \boldsymbol{A}_1 = \begin{bmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{bmatrix} \]

must be positive semi-definite. Taking the determinant we have

\[\operatorname{det} \left(\boldsymbol{A}_1\right) = 1 - \rho_1^2, \]

implying that the condition $|\rho_1| < 1$ must be respected. Now, let $\alpha_j = 0$ for $j > 2$, then we must verify that:

\[\var \left[ \alpha_0 X_{t} + \alpha_1 X_{t-1} + \alpha_2 X_{t-2} \right] = \gamma_0 \begin{bmatrix} \alpha_0 & \alpha_1 & \alpha_2 \end{bmatrix} \begin{bmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{bmatrix} \geq 0. \]

Again, this implies that the matrix

\[ \boldsymbol{A}_2 = \begin{bmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{bmatrix} \]

must be positive semi-definite and it is easy to verify that

\[\operatorname{det} \left(\boldsymbol{A}_2\right) = \left(1 - \rho_2 \right)\left(- 2 \rho_1^2 + \rho_2 + 1\right). \]

Thus, this implies that $|\rho_2| < 1$ as well as

\[\begin{aligned} &- 2 \rho_1^2 + \rho_2 + 1 \geq 0 \Rightarrow 1 > \rho_2 \geq 2 \rho_1^2 - 1 \\ &\Rightarrow 1 - \rho_1^2 > \rho_2 - \rho_1^2 \geq -(1 - \rho_1^2) \\ &\Rightarrow 1 > \frac{\rho_2 - \rho_1^2 }{1 - \rho_1^2} \geq -1. \end{aligned}\]

Therefore, $\rho_1$ and $\rho_2$ must lie in a parabolic shaped region defined by the above inequalities as illustrated in Figure \@ref(fig:admissibility).
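A minimal sketch of how such a plot could be produced is given below; the grid resolution and plotting choices are illustrative assumptions and not necessarily those used to generate Figure \@ref(fig:admissibility).

```{r}
# Sketch: shade the admissible (rho_1, rho_2) region implied by
# rho_2 >= 2 * rho_1^2 - 1 together with |rho_1| <= 1 and rho_2 <= 1
rho1 <- seq(-1, 1, length.out = 400)
rho2_lower <- 2 * rho1^2 - 1                    # parabolic lower boundary
plot(NA, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = expression(rho[1]), ylab = expression(rho[2]))
polygon(c(rho1, rev(rho1)), c(rho2_lower, rep(1, length(rho1))),
        col = "grey80", border = NA)            # admissible region
lines(rho1, rho2_lower, lwd = 2)                # boundary of the region
```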


From our derivation it is clear that the restrictions on the autocorrelation are very complicated, thereby justifying the need for other forms of fundamental representation which we will explore later in this text. Before moving on to the estimation of the autocorrelation and autocovariance functions, we must first discuss the stationarity of $(X_t)$, which will provide a convenient framework in which $\gamma(h)$ and $\rho(h)$ can be used (rather than $\gamma(t,t+h)$ for example) and (easily) estimated.

Stationarity

Definitions

There are two kinds of stationarity that are commonly used. They are defined below:

  1. A process $(X_t)$ is strongly (or strictly) stationary if the joint distribution of $\left(X_{t_1}, \ldots, X_{t_k}\right)$ is identical to that of $\left(X_{t_1 + h}, \ldots, X_{t_k + h}\right)$ for all $k$, all time points $t_1, \ldots, t_k$ and all shifts $h$.
  2. A process $(X_t)$ is weakly (or covariance) stationary if its mean is constant, its variance is finite and constant, and $\cov\left(X_t, X_{t+h}\right)$ depends only on the lag $h$.

These types of stationarity are not equivalent and the presence of one kind of stationarity does not imply the other. That is, a time series can be strongly stationary but not weakly stationary and vice versa. In some cases, a time series can be both strongly and weakly stationary and this occurs, for example, in the (jointly) Gaussian case. Stationarity of $(X_t)$ matters because it provides the framework in which averaging dependent data makes sense, thereby allowing us to easily estimate quantities such as the autocorrelation function.

Several remarks and comments can be made on these definitions:

Assessing Weak Stationarity of Time Series Models

It is important to understand how to verify if a postulated model is (weakly) stationary. In order to do so, we must ensure that our model satisfies the following three properties:

  1. $\mathbb{E}\left[X_t \right] = \mu_t = \mu < \infty$,
  2. $\var\left[X_t \right] = \sigma^2_t = \sigma^2 < \infty$,
  3. $\cov\left(X_t, X_{t+h} \right) = \gamma \left(h\right)$.

In the following examples we evaluate the stationarity of the processes introduced in Section \@ref(basic-time-series-models).

Example: Gaussian White Noise It is easy to verify that this process is stationary. Indeed, we have:

  1. $\mathbb{E}\left[ {{X_t}} \right] = 0$,
  2. $\gamma(0) = \sigma^2 < \infty$,
  3. $\gamma(h) = 0$ for $|h| > 0$.

Example: Random Walk To evaluate the stationarity of this process we first derive its properties:

  1. We begin by calculating the expectation of the process: \[ \mathbb{E}\left[ {{X_t}} \right] = \mathbb{E}\left[ {{X_{t - 1}} + {W_t}} \right] = \mathbb{E}\left[ {\sum\limits_{i = 1}^t {{W_i}} + {X_0}} \right] = \mathbb{E}\left[ {\sum\limits_{i = 1}^t {{W_i}} } \right] + c = c. \] Observe that the mean obtained is constant since it depends only on the value of the first term in the sequence.

  2. Next, after finding the mean to be constant, we calculate the variance to check stationarity: \[\begin{aligned} \var\left( {{X_t}} \right) &= \var\left( {\sum\limits_{i = 1}^t {{W_i}} + {X_0}} \right) = \var\left( {\sum\limits_{i = 1}^t {{W_i}} } \right) + \underbrace {\var\left( {{X_0}} \right)}_{= 0} \\ &= \sum\limits_{i = 1}^t {\var\left( {{W_i}} \right)} = t \sigma_w^2, \end{aligned}\] where $\sigma_w^2 = \var(W_t)$. Therefore, the variance depends on time $t$, contradicting our second property. Moreover, we have: \[\mathop {\lim }\limits_{t \to \infty } \; \var\left(X_t\right) = \infty.\] This process is therefore not weakly stationary.

  3. Regarding the autocovariance of a random walk, we have (taking $h \geq 0$ without loss of generality, by symmetry): \[\begin{aligned} \gamma \left( h \right) &= \cov\left( {{X_t},{X_{t + h}}} \right) = \cov\left( {\sum\limits_{i = 1}^t {{W_i}} ,\sum\limits_{j = 1}^{t + h} {{W_j}} } \right) = \cov\left( {\sum\limits_{i = 1}^t {{W_i}} ,\sum\limits_{j = 1}^t {{W_j}} } \right) \\ &= \min \left( {t,t + h} \right)\sigma _w^2 = \left( {t + \min \left( {0,h} \right)} \right)\sigma _w^2, \end{aligned} \] which further illustrates the non-stationarity of this process.

Moreover, the autocorrelation of this process is given by

\[\rho (h) = \frac{t + \min \left( {0,h} \right)}{\sqrt{t}\sqrt{t+h}},\]

implying (for a fixed $h$) that

\[\mathop {\lim }\limits_{t \to \infty } \; \rho(h) = 1.\]

In the following simulated example, we illustrate the non-stationary feature of such a process:
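A minimal sketch of such a simulation (two hundred random walk paths together with the theoretical 95% bands $\pm 1.96 \, \sigma_w \sqrt{t}$) is given below; the path length, innovation variance and seed are illustrative choices rather than the exact settings used for the figure described next.

```{r}
set.seed(1)
n_sim   <- 200     # number of simulated random walks
n       <- 1000    # length of each path
sigma_w <- 1       # standard deviation of the innovations W_t

# Each column is one random walk X_t = X_{t-1} + W_t with X_0 = 0
paths <- apply(matrix(rnorm(n * n_sim, sd = sigma_w), n, n_sim), 2, cumsum)

matplot(paths, type = "l", lty = 1, col = "grey70",
        xlab = "Time", ylab = expression(X[t]))
# Theoretical 95% bands based on Var(X_t) = t * sigma_w^2
lines( 1.96 * sigma_w * sqrt(1:n), col = "red", lty = 2, lwd = 2)
lines(-1.96 * sigma_w * sqrt(1:n), col = "red", lty = 2, lwd = 2)
```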


In the plot, two hundred simulated random walks are plotted along with the theoretical 95\% confidence intervals (red dashed lines). The relationship between time and variance can clearly be observed (i.e. the variance of the process increases with time).

Example: MA(1) Similarly to our previous examples, we attempt to verify the stationary properties for the MA(1) model defined in Section \@ref(ma1):

  1. \[ \mathbb{E}\left[ {{X_t}} \right] = \mathbb{E}\left[ {{\theta_1}{W_{t - 1}} + {W_t}} \right] = {\theta_1} \mathbb{E} \left[ {{W_{t - 1}}} \right] + \mathbb{E}\left[ {{W_t}} \right] = 0. \]
  2. \[\var \left( {{X_t}} \right) = \theta_1^2 \var \left( W_{t - 1}\right) + \var \left( W_{t}\right) = \left(1 + \theta_1^2 \right) \sigma^2_w.\]
  3. Regarding the autocovariance, we have \[\begin{aligned} \cov\left( {{X_t},{X_{t + h}}} \right) &= \mathbb{E}\left[ {\left( {{X_t} - \mathbb{E}\left[ {{X_t}} \right]} \right)\left( {{X_{t + h}} - \mathbb{E}\left[ {{X_{t + h}}} \right]} \right)} \right] = \mathbb{E}\left[ {{X_t}{X_{t + h}}} \right] \\ &= \mathbb{E}\left[ {\left( {{\theta_1}{W_{t - 1}} + {W_t}} \right)\left( {{\theta_1}{W_{t + h - 1}} + {W_{t + h}}} \right)} \right] \\ &= \mathbb{E}\left[ {\theta_1^2{W_{t - 1}}{W_{t + h - 1}} + \theta_1 {W_{t-1}}{W_{t + h}} + {\theta_1}{W_{t}}{W_{t + h - 1}} + {W_t}{W_{t + h}}} \right]. \end{aligned} \] It is easy to see that $\mathbb{E}\left[ {{W_t}{W_{t + h}}} \right] = \boldsymbol{1}_{\left\{ h = 0 \right\}}\sigma _w^2$ and therefore we obtain \[\cov \left( {{X_t},{X_{t + h}}} \right) = \left( {\theta_1^2 \, \boldsymbol{1}_{\left\{ h = 0 \right\}} + {\theta_1}\,\boldsymbol{1}_{\left\{ h = 1 \right\}} + {\theta_1}\,\boldsymbol{1}_{\left\{ h = - 1 \right\}} + \boldsymbol{1}_{\left\{ h = 0 \right\}}} \right)\sigma _w^2,\] implying the following autocovariance function: \[\gamma \left( h \right) = \begin{cases} \left( \theta_1^2 + 1 \right)\sigma _w^2 & h = 0 \\ \theta_1 \sigma _w^2 & \left| h \right| = 1 \\ 0 & \left| h \right| > 1 \end{cases}.\] Therefore, an MA(1) process is weakly stationary since both the mean and variance are constant over time and its autocovariance function is only a function of the lag $h$. Finally, we can easily obtain the autocorrelation for this process, which is given by $$\rho \left( h \right) = \begin{cases} 1 & h = 0 \\ \frac{\theta_1 \sigma _w^2}{\left( \theta_1^2 + 1 \right)\sigma _w^2} = \frac{\theta_1}{\theta_1^2 + 1} & \left| h \right| = 1 \\ 0 & \left| h \right| > 1 \end{cases}.$$ Interestingly, we can note that $|\rho(1)| \leq 0.5$. A quick numerical check of these derivations is sketched below.
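The sketch simulates a long MA(1) path and compares the sample lag-one autocorrelation with the theoretical value $\theta_1/(1+\theta_1^2)$; the value of $\theta_1$, the sample size and the seed are arbitrary choices.

```{r}
set.seed(123)
theta1 <- 0.9
n <- 1e4
w <- rnorm(n + 1)                          # Gaussian white noise
x <- theta1 * w[1:n] + w[2:(n + 1)]        # X_t = theta_1 * W_{t-1} + W_t

acf(x, plot = FALSE)$acf[2]                # sample rho(1)
theta1 / (1 + theta1^2)                    # theoretical rho(1)
```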

Example: AR(1) As another example, we shall verify the stationary properties of the AR(1) model defined in Section \@ref(ar1).

Using the backsubstitution technique, we can rearrange an AR(1) process so that it is written in a more compact form, i.e.

\[\begin{aligned} {X_t} & = {\phi }{X_{t - 1}} + {W_t} = \phi \left[ {\phi {X_{t - 2}} + {W_{t - 1}}} \right] + {W_t} = {\phi ^2}{X_{t - 2}} + \phi {W_{t - 1}} + {W_t} \\ & \;\; \vdots \\ & = {\phi ^k}{X_{t-k}} + \sum\limits_{j = 0}^{k - 1} {{\phi ^j}{W_{t - j}}} . \end{aligned} \]

By taking the limit in $k$ (which is perfectly valid as we assume $t \in \mathbb{Z}$) and assuming $|\phi|<1$, we obtain

\[ X_t = \mathop {\lim }\limits_{k \to \infty} \; \left( {\phi ^k}{X_{t-k}} + \sum\limits_{j = 0}^{k - 1} {{\phi ^j}{W_{t - j}}} \right) = \sum\limits_{j = 0}^{\infty} {{\phi ^j}{W_{t - j}}} \]

and therefore such a process can be interpreted as a linear combination of the white noise $(W_t)$ and corresponds (as we will see later on) to an MA($\infty$). In addition, the requirement $\left| \phi \right| < 1$ turns out to be extremely useful as the above formula is related to a geometric series, which would diverge if $|\phi| \geq 1$. Indeed, remember that an infinite (convergent) geometric series is given by

\[\sum\limits_{k = 0}^\infty \, a{{r^k}} = \frac{a}{{1 - r}}, \; {\text{ if }}\left| r \right| < 1.\]

With this setup, we demonstrate how crucial this property is by calculating each of the requirements of a stationary process.

  1. First, we will check to see if the mean is stationary. In this case, we opt to use limits to derive the expectation: \[\begin{aligned} \mathbb{E}\left[ {{X_t}} \right] &= \mathop {\lim }\limits_{k \to \infty } \mathbb{E}\left[ {{\phi^k}{X_{t-k}} + \sum\limits_{j = 0}^{k - 1} {\phi^j{W_{t - j}}} } \right] = \mathop {\lim }\limits_{k \to \infty } \, \underbrace {{\phi ^k}\,\mathbb{E}\left[X_{t-k}\right]}_{= \, 0} + \mathop {\lim }\limits_{k \to \infty } \, \sum\limits_{j = 0}^{k - 1} {\phi^j\underbrace {\mathbb{E}\left[ {{W_{t - j}}} \right]}_{ = \, 0}} = 0. \end{aligned} \] As expected, the mean is zero and, hence, the first criterion for weak stationarity is satisfied.
  2. Next, we opt to determine the variance of the process: \[\begin{aligned} \var\left( {{X_t}} \right) &= \mathop {\lim }\limits_{k \to \infty } \var\left( {{\phi^k}{X_{t-k}} + \sum\limits_{j = 0}^{k - 1} {\phi^j{W_{t - j}}} } \right) = \mathop {\lim }\limits_{k \to \infty } \sum\limits_{j = 0}^{k - 1} {\phi ^{2j} \var\left( {{W_{t - j}}} \right)} \\ &= \mathop {\lim }\limits_{k \to \infty } \sum\limits_{j = 0}^{k - 1} \sigma_w^2 \, {\phi ^{2j}} = \underbrace{\frac{\sigma_w^2}{1 - \phi ^2}}_{\text{geometric series}}. \end{aligned} \] Once again, the above result only holds because we are able to use the geometric series convergence as a result of $\left| \phi \right| < 1$.
  3. Finally, we consider the autocovariance of an AR(1). For $h > 0$, we have \[\gamma \left( h \right) = \cov\left( {{X_t},{X_{t + h}}} \right) = \phi \, \cov\left( {{X_t},{X_{t + h - 1}}} \right) = \phi \, \gamma \left( h-1 \right).\] Therefore, using the symmetry of the autocovariance, we have that \[\gamma \left( h \right) = \phi^{|h|} \, \gamma(0).\]

Both the mean and the variance do not depend on time; in addition, the autocovariance function depends only on the lag $h$. Thus, the AR(1) process is weakly stationary if $\left| \phi \right| < 1$. Lastly, we can obtain the autocorrelation for this process. Indeed, for $h > 0$, we have

\[\rho \left( h \right) = \frac{{\gamma \left( h \right)}}{{\gamma \left( 0 \right)}} = \frac{{\phi \gamma \left( {h - 1} \right)}}{{\gamma \left( 0 \right)}} = \phi \rho \left( {h - 1} \right).\]

After fully simplifying, we obtain

\[\rho \left( h \right) = {\phi^{|h|}}.\]

Thus, the autocorrelation function of an AR(1) exhibits geometric decay: the smaller $|\phi|$ is, the faster the autocorrelation approaches zero, while values of $|\phi|$ close to 1 imply a slower decay.
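This geometric decay can be visualized with a short sketch such as the following, which plots $\rho(h) = \phi^{|h|}$ for two arbitrarily chosen values of $\phi$.

```{r}
h <- 0:20
plot(h, 0.9^h, type = "h", lwd = 2, ylim = c(0, 1),
     xlab = "Lag h", ylab = expression(rho(h)))
points(h + 0.2, 0.5^h, type = "h", lwd = 2, col = "blue")
legend("topright", legend = expression(phi == 0.9, phi == 0.5),
       col = c("black", "blue"), lwd = 2)
```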

Estimation of the Mean Function

If a time series is stationary, the mean function is constant and a possible estimator of this quantity is given by

\[\bar{X} = {\frac{1}{n}\sum\limits_{t = 1}^n {{X_t}} }.\]

This estimator is clearly unbiased and has the following variance:

\[\begin{aligned} \var \left( {\bar X} \right) &= \var \left( {\frac{1}{n}\sum\limits_{t = 1}^n {{X_t}} } \right) \\ &= \frac{1}{{{n^2}}}\var \left( \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}_{1 \times n} \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}_{n \times 1} \right) \\ &= \frac{1}{{{n^2}}} \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}_{1 \times n} \begin{bmatrix} \gamma \left( 0 \right) & \gamma \left( 1 \right) & \cdots & \gamma \left( {n - 1} \right) \\ \gamma \left( 1 \right) & \gamma \left( 0 \right) & & \vdots \\ \vdots & & \ddots & \vdots \\ \gamma \left( {n - 1} \right) & \cdots & \cdots & \gamma \left( 0 \right) \end{bmatrix}_{n \times n} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}_{n \times 1} \\ &= \frac{1}{{{n^2}}}\left( {n\gamma \left( 0 \right) + 2\left( {n - 1} \right)\gamma \left( 1 \right) + 2\left( {n - 2} \right)\gamma \left( 2 \right) + \cdots + 2\gamma \left( {n - 1} \right)} \right) \\ &= \frac{1}{n}\sum\limits_{h = - n}^n {\left( {1 - \frac{{\left| h \right|}}{n}} \right)\gamma \left( h \right)}. \end{aligned} \]

In the white noise case, the above formula reduces to the usual $\var \left( {\bar X} \right) = \var(X_t)/n$.

Example: For an AR(1) we have $\gamma(h) = \phi^{|h|} \sigma_w^2 \left(1 - \phi^2\right)^{-1}$, therefore we obtain (after some computations):

\[ \var \left( {\bar X} \right) = \frac{\sigma_w^2 \left( n - 2\phi - n \phi^2 + 2 \phi^{n + 1}\right)}{n^2\left(1-\phi^2\right)\left(1-\phi\right)^2}.\]

Unfortunately, deriving such an exact formula is often difficult when considering more complex models. However, asymptotic approximations are often employed to simplify the calculation. For example, in our case we have

\[\mathop {\lim }\limits_{n \to \infty } \; n \var \left( {\bar X} \right) = \frac{\sigma_w^2}{\left(1-\phi\right)^2},\]

providing the following approximate formula:

\[\var \left( {\bar X} \right) \approx \frac{\sigma_w^2}{n \left(1-\phi\right)^2}.\]

Alternatively, simulation methods can also be employed. The figure generated by the following code compares these three methods:
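A sketch of such a comparison is given below, using the exact formula, the asymptotic approximation and a Monte Carlo estimate based on arima.sim(); the value of $\phi$, the grid of sample sizes and the number of replications are illustrative assumptions rather than the settings of the original figure.

```{r}
set.seed(1729)
phi <- 0.5
sigma2_w <- 1
n_grid <- seq(10, 300, by = 10)

# Exact formula derived above
exact <- sigma2_w * (n_grid - 2 * phi - n_grid * phi^2 + 2 * phi^(n_grid + 1)) /
  (n_grid^2 * (1 - phi^2) * (1 - phi)^2)

# Asymptotic approximation
asympt <- sigma2_w / (n_grid * (1 - phi)^2)

# Monte Carlo estimate (B replications per sample size)
B <- 500
mc <- sapply(n_grid, function(n) {
  var(replicate(B, mean(arima.sim(list(ar = phi), n = n, sd = sqrt(sigma2_w)))))
})

matplot(n_grid, cbind(exact, asympt, mc), type = "l", lty = 1:3, col = 1:3,
        xlab = "Sample size n", ylab = expression(Var(bar(X))))
legend("topright", c("Exact", "Asymptotic", "Simulation"),
       lty = 1:3, col = 1:3)
```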


Sample Autocovariance and Autocorrelation Functions

A natural estimator of the autocovariance function is given by:

\[\hat \gamma \left( h \right) = \frac{1}{T}\sum\limits_{t = 1}^{T - h} {\left( {{X_t} - \bar X} \right)\left( {{X_{t + h}} - \bar X} \right)}, \]

leading to the following "plug-in" estimator of the autocorrelation function:

\[\hat \rho \left( h \right) = \frac{{\hat \gamma \left( h \right)}}{{\hat \gamma \left( 0 \right)}}.\]

A graphical representation of the autocorrelation function is often the first step for any time series analysis (assuming the process to be stationary). Consider the following simulated example:
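A minimal sketch of such an example would simply simulate a Gaussian white noise sequence and display its sample autocorrelations; the sample size and seed are arbitrary.

```{r}
set.seed(42)
n_obs <- 100
x <- rnorm(n_obs)            # Gaussian white noise
acf(x, main = "Sample ACF of a simulated white noise sequence")
```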


In this example, the true autocorrelation is equal to zero at any lag $h \neq 0$, but obviously the estimated autocorrelations are random variables and are not equal to their true values. It would therefore be useful to have some knowledge about the variability of the sample autocorrelations (under some conditions) to assess whether the data come from a completely random series or present some significant correlation at certain lags. The following result provides an asymptotic solution to this problem:

Theorem: If $X_t$ is a strong white noise with finite fourth moment, then $\hat{\rho}(h)$ is approximately normally distributed with mean $0$ and variance $n^{-1}$ for all fixed $h$.

The proof of this Theorem is given in Appendix \@ref(appendix-a).

Using this result, we now have an approximate method to assess whether peaks in the sample autocorrelation are significant by determining whether the observed peak lies outside the interval $\pm 2/\sqrt{T}$ (i.e. an approximate 95% confidence interval). Returning to our previous example:
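Note that R's acf() already draws asymptotic confidence bands at $\pm 1.96/\sqrt{T}$ by default, so a sketch of this step can be as simple as the following, which additionally adds the $\pm 2/\sqrt{T}$ bands used in the text by hand.

```{r}
set.seed(42)
x <- rnorm(100)                                   # same simulated white noise
acf(x, main = "Sample ACF with approximate 95% bands")
abline(h = c(-2, 2) / sqrt(length(x)), col = "red", lty = 2)
```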


It can now be observed that most peaks lie within the interval $\pm 2/\sqrt{T}$ suggesting that the true data generating process is completely random (in the linear sense).

Unfortunately, this method is asymptotic (it relies on the central limit theorem) and there are no "exact" tools that can be used in this case. In the simulation study below we consider the "quality" of this result for $h = 3$ considering different sample sizes:
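A sketch of how such a study could be carried out is given below: for each sample size we simulate many white noise sequences, record $\hat{\rho}(3)$ and compare its empirical distribution with the N(0, 1/T) approximation. The number of replications and the chosen sample sizes are illustrative.

```{r}
set.seed(7)
B <- 1000
sample_sizes <- c(5, 30, 300)

par(mfrow = c(1, length(sample_sizes)))
for (Tn in sample_sizes) {
  # Sample autocorrelation at lag 3 for B independent white noise series
  rho3 <- replicate(B, acf(rnorm(Tn), lag.max = 3, plot = FALSE)$acf[4])
  hist(rho3, freq = FALSE, breaks = 30, main = paste("T =", Tn),
       xlab = expression(hat(rho)(3)))
  curve(dnorm(x, sd = 1 / sqrt(Tn)), add = TRUE, col = "red", lwd = 2)
}
par(mfrow = c(1, 1))
```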


It can clearly be observed that the asymptotic approximation is quite poor when $T = 5$ but as the sample size increases the approximation improves and is very close when, for example, $T = 300$.

Joint Stationarity

Two time series, say $\left(X_t \right)$ and $\left(Y_t\right)$, are said to be jointly stationary if they are each stationary, and the cross-covariance function

\[{\gamma _{XY}}\left( {t,t + h} \right) = \cov\left( {{X_t},{Y_{t + h}}} \right) = {\gamma _{XY}}\left( h \right)\]

is a function only of lag $h$.

The cross-correlation function for jointly stationary time series can be expressed as:

\[{\rho _{XY}}\left( {t,t + h} \right) = \frac{{{\gamma _{XY}}\left( {t,t + h} \right)}}{{\sigma_{X_t}\,\sigma_{Y_{t + h}}}} = \frac{{{\gamma _{XY}}\left( h \right)}}{{\sigma_{X_t}\,\sigma_{Y_{t + h}}}} = {\rho _{XY}}\left( h \right).\]

These ideas can be extended beyond the bivariate case to a general multivariate setting.

Robustness Issues

The data generating process delivers a theoretical autocorrelation (autocovariance) function that, as explained in the previous section, can then be estimated through the sample autocorrelation (autocovariance) functions. In practice, however, the sample often comes from a data generating process that is only "close" to the true one, meaning that the sample suffers from some form of small contamination. This contamination is typically represented by a small number of extreme observations, called "outliers", which come from a process that is different from the true data generating process.

The fact that the sample can suffer from outliers implies that the standard estimation of the autocorrelation (autocovariance) functions through the sample functions can be highly biased. The standard estimators presented in the previous section are therefore not "robust" and can behave badly when the sample suffers from contamination. The following simulated example highlights how the performance of these estimators can deteriorate if the sample is contaminated:
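The sketch below illustrates this idea under illustrative assumptions (an AR(1) process, a small proportion of large additive outliers and a modest number of replications), which are not necessarily the settings used to produce the figure described next.

```{r}
set.seed(2718)
B <- 500; n <- 200; phi <- 0.9
eps <- 0.05                                  # proportion of contaminated points

rho1_clean <- rho1_cont <- numeric(B)
for (b in 1:B) {
  x <- arima.sim(list(ar = phi), n = n)
  rho1_clean[b] <- acf(x, plot = FALSE)$acf[2]
  idx <- sample(n, size = ceiling(eps * n))  # randomly placed outliers
  x[idx] <- rnorm(length(idx), sd = 10 * sd(x))
  rho1_cont[b] <- acf(x, plot = FALSE)$acf[2]
}

boxplot(list(Clean = rho1_clean, Contaminated = rho1_cont),
        ylab = expression(hat(rho)(1)))
abline(h = phi, col = "red", lwd = 2)        # true lag-one autocorrelation
```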


The boxplots in each figure show how the standard autocorrelation estimator is centred around the true value (red line) when the sample is not contaminated (left boxplot) while it is considerably biased when the sample is contaminated (right boxplot), especially at the smallest lags. Indeed, it can be seen how the boxplots under contamination are often close to zero, indicating that the estimator does not detect much dependence in the data although it should. This is a known result in robustness: outliers in the data can break the dependence structure, making it more difficult to detect.

In order to limit this problem, different robust estimators exist for time series settings which reduce the impact of contamination on the estimation procedure. Among these there are a few that estimate the autocorrelation (autocovariance) functions in a robust manner. One of these estimators is provided in the robacf() function in the "robcor" package, and the following simulated example shows how it limits the bias due to contamination:
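A minimal sketch of such a comparison on the same kind of contaminated series is given below; it assumes that robacf() mirrors the interface of acf() (a plot argument and an acf component), so some adjustments may be needed.

```{r, eval = FALSE}
# install.packages("robcor")                 # if not already installed
library(robcor)

set.seed(2719)
B <- 500; n <- 200; phi <- 0.9; eps <- 0.05

rho1_std <- rho1_rob <- numeric(B)
for (b in 1:B) {
  x <- arima.sim(list(ar = phi), n = n)
  idx <- sample(n, size = ceiling(eps * n))
  x[idx] <- rnorm(length(idx), sd = 10 * sd(x))
  rho1_std[b] <- acf(x, plot = FALSE)$acf[2]     # standard estimator
  rho1_rob[b] <- robacf(x, plot = FALSE)$acf[2]  # robust estimator
}

boxplot(list(Standard = rho1_std, Robust = rho1_rob),
        ylab = expression(hat(rho)(1)))
abline(h = phi, col = "red", lwd = 2)            # true lag-one autocorrelation
```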


The robust estimator remains close to the true value represented by the red line in the boxplots as opposed to the standard estimator. It can also be observed that to reduce the bias induced by contamination in the sample, robust estimators pay a certain price in terms of efficiency as highlighted by the boxplots that show more variability compared to those of the standard estimator.

Let us consider the issue of robustness on a real data set coming from the domain of hydrology. The data concern monthly precipitation (in mm) recorded between 1907 and 1972 and are of interest to scientists studying water cycles. Let us compare the standard and robust estimators of the autocorrelation functions:
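Assuming the monthly precipitation series is available as a numeric vector called precip (a hypothetical name, since the actual data object is not shown here), the comparison could be sketched as follows.

```{r, eval = FALSE}
# precip: monthly precipitation (in mm), 1907 to 1972 (hypothetical object name)
par(mfrow = c(1, 2))
acf(precip, main = "Standard ACF")
robcor::robacf(precip, main = "Robust ACF")
par(mfrow = c(1, 1))
```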


It can be seen that, under certain assumptions (e.g. linear dependence), the standard estimator does not detect any significant autocorrelation between lags since the estimates all lie within the asymptotic confidence intervals. However, many of the robust estimates lie outside these confidence intervals at different lags, indicating that there could be dependence within the data. If one were to rely only on the standard estimator in this case, erroneous conclusions may be drawn from this data. Robustness issues therefore need to be considered in any time series analysis, not only when estimating the autocorrelation (autocovariance) functions.

Portmanteau test

In this section we give a brief introduction to Portmanteau tests used in time series analysis. In linear regression, we always need to run diagnostic tests on the residuals after building a model to check whether our assumptions are satisfied. If there is no evidence to reject any of the assumptions, we can say that the linear model we built is adequate. Otherwise, the model is not adequate and some modifications or transformations need to be applied either to the model or to the data. This rule also applies to time series modeling. In time series analysis, a wide variety of Portmanteau tests can be used to check the white noise assumption on the residuals. We introduce two of them below, both based on the ACF of the residuals, in order to illustrate some of the ideas behind this kind of test.

In 1970, Box and Pierce proposed the well-known Box-Pierce test statistic of the following form: \begin{equation} Q_{BP} = n\sum_{h =1}^m \hat{\rho}_h^2, \end{equation} where the empirical autocorrelation of the residuals at lag $h$ is defined as $\hat{\rho}_h = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$. It is obvious that under the alternative hypothesis $\hat{\rho}_h$ would deviate from $0$, and thus a large $Q_{BP}$ gives us evidence to reject the null. Under the null hypothesis that the residuals are white noise (or, equivalently, that the time series model is adequate), it can be shown that, as $n \to \infty$, \begin{equation} Q_{BP} \overset{\mathcal{D}}{\to} \chi^2_{m - m^{\prime}}, \end{equation} where $m^{\prime}$ is the number of parameters in the time series model.

Then, in 1978, Ljung and Box improved the Box-Pierce test by standardizing each $\hat{\rho}_h$ by its asymptotic variance. The Ljung-Box test statistic is \begin{equation} Q_{LB} = n\sum_{h =1}^m \frac{n+2}{n-h}\hat{\rho}_h^2. \end{equation} It can also be shown that $Q_{LB} \overset{\mathcal{D}}{\to} \chi^2_{m - m^{\prime}}$ under the null. However, compared to $Q_{BP}$, the distribution of $Q_{LB}$ under the null is closer to $\chi^2_{m - m^{\prime}}$ when $n$ is finite.

In the above two examples, the test statistic contains a user-specified parameter $m$, and for different $m$ the power of the test will differ. Thus, much work has been done either to select the optimal $m$ or to propose new test statistics without user-specified parameters. Moreover, testing for white noise can also be done by checking the PACF, or by checking the spectral density in the frequency domain. These approaches lead to many different Portmanteau tests.

As an example, consider the following simulation which uses a Portmanteau test to show the distribution of the test statistics under the null:
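A sketch of such a simulation is given below: it computes both statistics on white noise samples (so that the null holds and $m^{\prime} = 0$) and compares their empirical distributions with the $\chi^2_m$ reference. The sample size, $m$ and the number of replications are arbitrary choices.

```{r}
set.seed(31)
B <- 2000; n <- 100; m <- 10

QBP <- QLB <- numeric(B)
for (b in 1:B) {
  x <- rnorm(n)                                   # white noise: null is true
  r <- acf(x, lag.max = m, plot = FALSE)$acf[-1]  # rho_hat(1), ..., rho_hat(m)
  QBP[b] <- n * sum(r^2)                          # Box-Pierce statistic
  QLB[b] <- n * (n + 2) * sum(r^2 / (n - 1:m))    # Ljung-Box statistic
}

par(mfrow = c(1, 2))
hist(QBP, freq = FALSE, breaks = 30, main = "Box-Pierce")
curve(dchisq(x, df = m), add = TRUE, col = "red", lwd = 2)
hist(QLB, freq = FALSE, breaks = 30, main = "Ljung-Box")
curve(dchisq(x, df = m), add = TRUE, col = "red", lwd = 2)
par(mfrow = c(1, 1))
```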


From the histograms, we can see that under the null the distributions of both the BP and LB statistics are close to the chi-square distribution, with LB being slightly closer.

The following simulation shows the distribution of p-values under different alternatives and illustrates that the behaviour of the test depends on the specified $m$:
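A sketch of such a power study is given below, where the data are generated from an AR(1) (so the white noise null is false) and the built-in Box.test() is applied with several values of $m$; the AR coefficient, sample size and values of $m$ are illustrative assumptions.

```{r}
set.seed(57)
B <- 1000; n <- 100
m_values <- c(1, 5, 20)

# p-values of the Ljung-Box test under an AR(1) alternative
pvals <- sapply(m_values, function(m) {
  replicate(B, Box.test(arima.sim(list(ar = 0.3), n = n),
                        lag = m, type = "Ljung-Box")$p.value)
})

par(mfrow = c(1, length(m_values)))
for (j in seq_along(m_values)) {
  hist(pvals[, j], breaks = 20, main = paste("m =", m_values[j]),
       xlab = "p-value")
}
par(mfrow = c(1, 1))
```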



